Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era
Jiang, Shuo, Xie, Min, Chen, Frank Youhua, Ma, Jian, Luo, Jianxi
Research and practice in Intelligent Design (ID) have significantly enhanced engineering innovation, efficiency, quality, and productivity over recent decades, fundamentally reshaping how engineering designers think, behave, and interact with design processes. The recent emergence of Foundation Models (FMs), particularly Large Language Models (LLMs), has demonstrated general knowledge-based reasoning capabilities and opened new avenues for further transformation in engineering design. In this context, this paper introduces Intelligent Design 4.0 (ID 4.0) as an emerging paradigm empowered by foundation model-based agentic AI systems. We review the historical evolution of ID across four distinct stages: rule-based expert systems, task-specific machine learning models, large-scale foundation AI models, and the recently emerging paradigm of foundation model-based multi-agent collaboration. We propose an ontological framework for ID 4.0 and discuss its potential to support end-to-end automation of engineering design processes through coordinated, autonomous multi-agent-based systems. Furthermore, we discuss challenges and opportunities of ID 4.0, including perspectives on data foundations, agent collaboration mechanisms, and the formulation of design problems and objectives. In sum, these insights provide a foundation for advancing Intelligent Design toward greater adaptivity, autonomy, and effectiveness in addressing the growing complexity of engineering design.
- Asia > China > Hong Kong (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (2 more...)
Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models
Kyaw, Alexander Htet, Gupta, Richa, Shah, Dhruv, Sinha, Anoop, Mathewson, Kory, Pender, Stefanie, Chitta, Sachin, Koga, Yotto, Ahmed, Faez, Sass, Lawrence, Davis, Randall
Advances in 3D generative AI have enabled the creation of physical objects from text prompts, but challenges remain in creating objects involving multiple component types. We present a pipeline that integrates 3D generative AI with vision-language models (VLMs) to enable the robotic assembly of multi-component objects from natural language. Our method leverages VLMs for zero-shot, multi-modal reasoning about geometry and functionality to decompose AI-generated meshes into multi-component 3D models using predefined structural and panel components. We demonstrate that a VLM is capable of determining which mesh regions need panel components in addition to structural components, based on the object's geometry and functionality. Evaluation across test objects shows that users preferred the VLM-generated assignments 90.6% of the time, compared to 59.4% for rule-based and 2.5% for random assignment. Lastly, the system allows users to refine component assignments through conversational feedback, enabling greater human control and agency in making physical objects with generative AI and robotics.
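To make the decomposition step concrete, here is a minimal sketch of the component-assignment idea; the `MeshRegion` fields and the `query_vlm` stub are assumptions standing in for the paper's actual geometry analysis and VLM prompt:

```python
# Minimal sketch, assuming hypothetical helpers: `query_vlm` stands in for a
# real zero-shot vision-language model call on a rendered mesh region.
from dataclasses import dataclass

@dataclass
class MeshRegion:
    name: str
    is_load_bearing: bool   # assumed output of an upstream geometry analysis
    encloses_surface: bool  # e.g., a tabletop or seat that needs a panel

def query_vlm(region: MeshRegion) -> str:
    """Stand-in for a VLM query; a real system would send an image crop plus
    a text prompt about geometry/functionality and parse the answer."""
    # Heuristic stub: panels cover enclosing surfaces, structure carries load.
    if region.encloses_surface:
        return "panel+structure" if region.is_load_bearing else "panel"
    return "structure"

regions = [
    MeshRegion("leg_front_left", is_load_bearing=True, encloses_surface=False),
    MeshRegion("tabletop", is_load_bearing=True, encloses_surface=True),
]
for r in regions:
    print(r.name, "->", query_vlm(r))
```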
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
VideoCAD: A Dataset and Model for Learning Long-Horizon 3D CAD UI Interactions from Video
Man, Brandon, Nehme, Ghadi, Alam, Md Ferdous, Ahmed, Faez
Computer-Aided Design (CAD) is a time-consuming and complex process, requiring precise, long-horizon user interactions with intricate 3D interfaces. While recent advances in AI-driven user interface (UI) agents show promise, most existing datasets and methods focus on short, low-complexity tasks in mobile or web applications, failing to capture the demands of professional engineering tools. In this work, we introduce VideoCAD, the first attempt to model UI interactions for precision engineering tasks. Specifically, VideoCAD is a large-scale synthetic dataset consisting of over 41K annotated video recordings of CAD operations, generated using an automated framework for collecting high-fidelity UI action data from human-made CAD designs. Compared to existing datasets, VideoCAD offers an order-of-magnitude increase in complexity for real-world engineering UI tasks, with time horizons up to 20x longer than those in other datasets. We show two important downstream applications of VideoCAD: (1) learning UI interactions from professional 3D CAD tools for precision tasks and (2) a visual question-answering (VQA) benchmark designed to evaluate multimodal large language models (LLMs) on spatial reasoning and video understanding. To learn the UI interactions, we propose VideoCADFormer, a state-of-the-art model for learning CAD interactions directly from video, which outperforms existing behavior cloning baselines. Both VideoCADFormer and the VQA benchmark derived from VideoCAD reveal key challenges in the current state of video-based UI understanding, including the need for precise action grounding, multi-modal and spatial reasoning, and long-horizon dependencies.
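For illustration, a minimal record type for this kind of long-horizon UI action data might look as follows; the field names are assumptions for the sketch, not the published dataset schema:

```python
# Illustrative only: one annotated UI action in the spirit of VideoCAD.
from dataclasses import dataclass
from typing import Optional

@dataclass
class UIAction:
    frame_idx: int                  # index into the screen-recording video
    action_type: str                # e.g., "click", "drag", "type"
    target: str                     # UI element, e.g., "sketch.line_tool"
    cursor_xy: tuple[float, float]  # normalized screen coordinates
    text: Optional[str] = None      # payload for "type" actions

# A toy 3-step trajectory; real CAD sequences span thousands of frames.
trajectory = [
    UIAction(0, "click", "sketch.line_tool", (0.12, 0.05)),
    UIAction(48, "drag", "canvas", (0.40, 0.55)),
    UIAction(96, "type", "dimension_box", (0.50, 0.60), text="25.0"),
]
print(len(trajectory), "actions")
```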
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada > Manitoba > Westman Region > Brandon (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Agentic Large Language Models for Conceptual Systems Engineering and Design
Early-stage engineering design involves complex, iterative reasoning, yet existing large language model (LLM) workflows struggle to maintain task continuity and generate executable models. We evaluate whether a structured multi-agent system (MAS) can more effectively manage requirements extraction, functional decomposition, and simulator code generation than a simpler two-agent system (2AS). The target application is a solar-powered water filtration system as described in a cahier des charges (requirements specification). We introduce the Design-State Graph (DSG), a JSON-serializable representation that bundles requirements, physical embodiments, and Python-based physics models into graph nodes. A nine-role MAS iteratively builds and refines the DSG, while the 2AS collapses the process to a Generator-Reflector loop. Both systems run a total of 60 experiments (2 LLMs, Llama 3.3 70B vs. reasoning-distilled DeepSeek R1 70B, x 2 agent configurations x 3 temperatures x 5 seeds). We report JSON validity, requirement coverage, embodiment presence, code compatibility, workflow completion, runtime, and graph size. Across all runs, both MAS and 2AS maintained perfect JSON integrity and embodiment tagging. Requirement coverage remained minimal (less than 20%). Code compatibility peaked at 100% under specific 2AS settings but averaged below 50% for MAS. Only the reasoning-distilled model reliably flagged workflow completion. Powered by DeepSeek R1 70B, the MAS generated more granular DSGs (average 5-6 nodes) whereas the 2AS mode-collapsed. Structured multi-agent orchestration enhanced design detail, and the reasoning-distilled LLM improved completion rates, yet low requirement coverage and fidelity gaps in generated code persisted.
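As a rough illustration of what a JSON-serializable Design-State Graph node could bundle, here is a minimal sketch under an assumed simplified schema; the paper's actual DSG fields may differ:

```python
# Sketch of one DSG node: requirement + embodiment + physics model, all
# serializable. Field names and the example physics are assumptions.
import json

node = {
    "id": "n1",
    "requirement": "Filter 100 L/day of water using solar power only",
    "embodiment": "PV panel + DC pump + ceramic filter cartridge",
    # Physics model stored as source text so the graph stays serializable;
    # an agent can execute or refine it in a later iteration.
    "physics_model": (
        "def flow_rate(pv_watts, pump_eff=0.35, head_m=2.0):\n"
        "    rho_g = 9810.0  # specific weight of water, N/m^3\n"
        "    return pv_watts * pump_eff / (rho_g * head_m)  # m^3/s\n"
    ),
    "edges": [],
}
serialized = json.dumps(node)            # round-trips losslessly
assert json.loads(serialized)["id"] == "n1"
```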
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Manitoba > Westman Region > Brandon (0.04)
- Energy > Renewable > Solar (1.00)
- Water & Waste Management > Water Management (0.93)
- Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
GRAID: Enhancing Spatial Reasoning of VLMs Through High-Fidelity Data Generation
Elmaaroufi, Karim, Lai, Liheng, Svegliato, Justin, Bai, Yutong, Seshia, Sanjit A., Zaharia, Matei
Vision Language Models (VLMs) achieve strong performance on many vision-language tasks but often struggle with spatial reasoning--a prerequisite for many applications. Empirically, we find that a dataset produced by a current training data generation pipeline has a 57.6% human validation rate. This low rate stems from current limitations: single-image 3D reconstruction introduces cascading modeling errors and requires wide answer tolerances, while caption-based methods require hyper-detailed annotations and suffer from generative hallucinations. We present GRAID, built on the key insight that qualitative spatial relationships can be reliably determined from 2D geometric primitives alone. By operating exclusively on 2D bounding boxes from standard object detectors, GRAID avoids both 3D reconstruction errors and generative hallucinations, producing datasets of higher quality, as validated by human evaluations, than those generated by existing tools. We apply our framework to the BDD100k, NuImages, and Waymo datasets, generating over 8.5 million high-quality VQA pairs with questions spanning spatial relations, counting, ranking, and size comparisons. We evaluate one of the datasets and find it achieves 91.16% human-validated accuracy--compared to 57.6% on a dataset generated by recent work. Critically, we demonstrate that when trained on GRAID data, models learn spatial reasoning concepts that generalize: models fine-tuned on 6 question types improve on over 10 held-out types, with accuracy gains of 47.5% on BDD and 37.9% on NuImages for Llama 3.2 11B, and when trained on all question types, achieve improvements on several existing benchmarks such as BLINK. The GRAID framework, datasets, and additional information can be found on our project page.

Vision Language Models (VLMs) have already shown promise in a wide variety of applications, such as medical diagnosis (Jin et al., 2024), biology (Maruf et al., 2025), and engineering design (Picard et al., 2025). However, despite this promise, a key failure mode of VLMs is that they are poor spatial reasoners: they struggle to understand how objects are located in space and the spatial relationships between them. For example, in medical image analysis, Jin et al. (2024) found that VLMs were unable to recognize that skin lesions shown at different angles were the same pathology. Similarly, in robotics, Wang et al. (2025) found that without integrating explicit spatial relationships, VLMs were unable to produce high-level, executable robotic task plans.
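The core insight, that qualitative relations follow from 2D boxes alone, can be sketched in a few lines; the box format and question template below are illustrative assumptions, not GRAID's actual implementation:

```python
# Sketch: deterministic spatial relations from detector bounding boxes,
# with boxes assumed in (x_min, y_min, x_max, y_max) pixel format.
def left_of(a, b):
    """True if box a lies entirely to the left of box b."""
    return a[2] < b[0]

def larger_than(a, b):
    """Compare pixel areas, a proxy for which object appears larger."""
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return area(a) > area(b)

car, person = (10, 40, 120, 90), (150, 30, 180, 95)
qa = {
    "question": "Is the car to the left of the person?",
    "answer": "yes" if left_of(car, person) else "no",
}
print(qa)  # deterministic: no 3D reconstruction, no generative captioning
```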
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada > Manitoba > Westman Region > Brandon (0.04)
- Asia > Singapore (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Literature-Grounded Novelty Assessment of Scientific Ideas
Shahid, Simra, Radensky, Marissa, Fok, Raymond, Siangliulue, Pao, Weld, Daniel S., Hope, Tom
Automated scientific idea generation systems have made remarkable progress, yet the automatic evaluation of idea novelty remains a critical and underexplored challenge. Manual evaluation of novelty through literature review is labor-intensive, prone to error due to subjectivity, and impractical at scale. To address these issues, we propose the Idea Novelty Checker, an LLM-based retrieval-augmented generation (RAG) framework that leverages a two-stage retrieve-then-rerank approach. The Idea Novelty Checker first collects a broad set of relevant papers using keyword and snippet-based retrieval, then refines this collection through embedding-based filtering followed by facet-based LLM re-ranking. It incorporates expert-labeled examples to guide the system in comparing papers for novelty evaluation and in generating literature-grounded reasoning. Our extensive experiments demonstrate that our novelty checker achieves approximately 13% higher agreement than existing approaches. Ablation studies further demonstrate the importance of the facet-based re-ranker in identifying the most relevant literature for novelty evaluation.
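A minimal sketch of the retrieve-then-rerank flow is below; `keyword_search`, `embed`, and `llm_rerank` are stubs standing in for a real search API, embedding model, and LLM re-ranker, so the specifics are assumptions:

```python
# Two-stage retrieve-then-rerank sketch in the spirit of a novelty checker.
import math

def keyword_search(idea: str) -> list[str]:
    return ["paper A", "paper B", "paper C"]             # stub corpus hits

def embed(text: str) -> list[float]:
    return [float(len(text) % 7), float(len(text) % 5)]  # toy embedding

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def llm_rerank(idea: str, papers: list[str]) -> list[str]:
    return papers  # stub: a real system prompts an LLM with novelty facets

idea = "LLM agents for end-to-end CAD model generation"
candidates = keyword_search(idea)
# Stage 1: embedding-based filtering keeps the most similar candidates.
filtered = sorted(candidates, key=lambda p: cosine(embed(idea), embed(p)),
                  reverse=True)[:2]
# Stage 2: facet-based LLM re-ranking orders them for novelty comparison.
print(llm_rerank(idea, filtered))
```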
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (9 more...)
Large Language Models for Computer-Aided Design: A Survey
Zhang, Licheng, Le, Bach, Akhtar, Naveed, Lam, Siew-Kei, Ngo, Tuan
Large Language Models (LLMs) have seen rapid advancements in recent years, with models like ChatGPT and DeepSeek showcasing remarkable capabilities across diverse domains. While substantial research has been conducted on LLMs in various fields, a comprehensive review focusing on their integration with Computer-Aided Design (CAD) remains notably absent. CAD is the industry standard for 3D modeling and plays a vital role in the design and development of products across different industries. As the complexity of modern designs increases, the potential for LLMs to enhance and streamline CAD workflows presents an exciting frontier. This article presents the first systematic survey exploring the intersection of LLMs and CAD. We begin by outlining the industrial significance of CAD, highlighting the need for AI-driven innovation. Next, we provide a detailed overview of the foundations of LLMs. We also examine both closed-source and publicly available models. The core of this review focuses on the various applications of LLMs in CAD, providing a taxonomy of six key areas where these models are making a considerable impact. Finally, we propose several promising future directions for further advancements, which offer vast opportunities for innovation and are poised to shape the future of CAD technology. Github: https://github.com/lichengzhanguom/LLMs-CAD-Survey-Taxonomy
- Oceania > Australia > Victoria > Melbourne (0.04)
- Asia > Singapore (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (3 more...)
- Research Report > Promising Solution (0.46)
- Overview > Innovation (0.34)
Heterogeneous Federated Learning Systems for Time-Series Power Consumption Prediction with Multi-Head Embedding Mechanism
Syu, Jia-Hao, Lin, Jerry Chun-Wei, Srivastava, Gautam, Yun, Unil
Time-series prediction is increasingly popular in a variety of applications, such as smart factories and smart transportation. Researchers have used various techniques to predict power consumption, but existing models lack discussion of collaborative learning and privacy issues among multiple clients. To address these issues, we propose Multi-Head Heterogeneous Federated Learning (MHHFL) systems that consist of multiple head networks, which independently act as carriers for federated learning. During the federated period, each head network is embedded into a 2-dimensional vector and shared with the centralized source pool. MHHFL then selects appropriate source networks and blends the head networks as knowledge transfer in federated learning. The experimental results show that the proposed MHHFL systems significantly outperform the benchmark and state-of-the-art systems, reducing prediction error by 24.9% to 94.1%. The ablation studies demonstrate the effectiveness of the proposed head-network embedding and selection mechanisms in MHHFL, which significantly outperform traditional federated averaging and random transfer.
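To illustrate the head-network embedding and selection idea, here is a hedged sketch; the 2-D vectors, distance-based selection, and weight-averaging transfer rule are simplifying assumptions, not the paper's exact mechanisms:

```python
# Sketch: each client's head network is summarized as a 2-D vector, the
# server pools these, and the nearest source heads are blended by averaging.
import math

def nearest_sources(target, pool, k=2):
    return sorted(pool, key=lambda item: math.dist(target, item["vec"]))[:k]

def blend(heads):
    """Average the weights of the selected heads (assumed transfer rule)."""
    n = len(heads)
    return [sum(w) / n for w in zip(*(h["weights"] for h in heads))]

pool = [
    {"client": "factory_1", "vec": (0.10, 0.90), "weights": [0.2, 0.5, -0.1]},
    {"client": "factory_2", "vec": (0.80, 0.20), "weights": [0.4, 0.1, 0.3]},
    {"client": "transit_1", "vec": (0.15, 0.85), "weights": [0.1, 0.6, 0.0]},
]
selected = nearest_sources((0.12, 0.88), pool)
print([s["client"] for s in selected], "->", blend(selected))
```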
- Europe > Poland (0.04)
- Asia > Taiwan (0.04)
- North America > Canada > Manitoba > Westman Region > Brandon (0.04)
- (2 more...)
- Information Technology > Security & Privacy (0.88)
- Energy > Power Industry (0.68)
Supervision-free Vision-Language Alignment
Giannone, Giorgio, Li, Ruoteng, Feng, Qianli, Perevodchikov, Evgeny, Chen, Rui, Martinez, Aleix
Vision-language models (VLMs) have demonstrated remarkable potential in integrating visual and linguistic information, but their performance is often constrained by the need for extensive, high-quality image-text training data. Curation of these image-text pairs is both time-consuming and computationally expensive. To address this challenge, we introduce SVP (Supervision-free Visual Projection), a novel framework that enhances vision-language alignment without relying on curated data or preference annotation. SVP leverages self-captioning and a pre-trained grounding model as a feedback mechanism to elicit latent information in VLMs. We evaluate our approach across six key areas: captioning, referring, visual question answering, multitasking, hallucination control, and object recall. Results demonstrate significant improvements, including a 14% average improvement in captioning tasks, up to 12% increase in object recall, and substantial reduction in hallucination rates. Notably, a small VLM using SVP achieves hallucination reductions comparable to a model five times larger, while a VLM with initially poor referring capabilities more than doubles its performance, approaching parity with a model twice its size.
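The feedback loop can be sketched roughly as follows, with `vlm_caption` and `grounder` as stubs for the actual models; the threshold, phrases, and the framing of grounded vs. ungrounded phrases as a training signal are illustrative assumptions:

```python
# Sketch of a supervision-free alignment loop: the VLM captions its own
# input, and a pre-trained grounding model checks which caption phrases are
# actually localizable in the image.
def vlm_caption(image_id: str) -> list[str]:
    return ["a red ball", "a blue cube", "a green pyramid"]  # stub phrases

def grounder(image_id: str, phrase: str) -> float:
    """Stand-in for a grounding model's detection confidence."""
    return {"a red ball": 0.92, "a blue cube": 0.88}.get(phrase, 0.05)

image_id = "img_001"
phrases = vlm_caption(image_id)
grounded = [p for p in phrases if grounder(image_id, p) > 0.5]
ungrounded = [p for p in phrases if p not in grounded]
# Grounded vs. ungrounded phrases yield a feedback signal with no human labels.
print("keep:", grounded, "| penalize:", ungrounded)
```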
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada > Manitoba > Westman Region > Brandon (0.04)
- (2 more...)
- Leisure & Entertainment > Sports > Tennis (0.93)
- Health & Medicine (0.92)
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Zhang, Jiarui, Liu, Ollie, Yu, Tianyu, Hu, Jinyi, Neiswanger, Willie
Multimodal large language models (MLLMs) have made rapid progress in recent years, yet continue to struggle with low-level visual perception (LLVP) -- particularly the ability to accurately describe the geometric details of an image. This capability is crucial for applications in areas such as robotics, medical image analysis, and manufacturing. In this paper, we first introduce Geoperception, a benchmark designed to evaluate an MLLM's ability to accurately transcribe 2D geometric information from an image. Using this benchmark, we demonstrate the limitations of leading MLLMs, and then conduct a comprehensive empirical study to explore strategies for improving their performance on geometric tasks. Our findings highlight the benefits of certain model architectures, training techniques, and data strategies, including the use of high-fidelity synthetic data and multi-stage training with a data curriculum. Notably, we find that a data curriculum enables models to learn challenging geometry understanding tasks which they fail to learn from scratch. Leveraging these insights, we develop Euclid, a family of models specifically optimized for strong low-level geometric perception. Although purely trained on synthetic multimodal data, Euclid shows strong generalization ability to novel geometry shapes. For instance, Euclid outperforms the best closed-source model, Gemini-1.5-Pro, by up to 58.56% on certain Geoperception benchmark tasks and 10.65% on average across all tasks.
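As a toy example of how synthetic data yields exact low-level geometric labels, here is a sketch of a question generator; a real pipeline would also render the segments into images, and the canvas size and question template are assumptions:

```python
# Sketch: exact labels come free with synthetic geometric scenes.
import random

def make_sample(rng: random.Random) -> dict:
    # Two random line segments on a 100x100 canvas, with an exact answer.
    seg = lambda: (rng.uniform(0, 100), rng.uniform(0, 100),
                   rng.uniform(0, 100), rng.uniform(0, 100))
    a, b = seg(), seg()
    length = lambda s: ((s[2] - s[0]) ** 2 + (s[3] - s[1]) ** 2) ** 0.5
    return {
        "segments": {"AB": a, "CD": b},
        "question": "Which segment is longer, AB or CD?",
        "answer": "AB" if length(a) > length(b) else "CD",  # exact, no noise
    }

rng = random.Random(0)
dataset = [make_sample(rng) for _ in range(3)]
print(dataset[0]["answer"])
```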
- North America > United States > California (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Manitoba > Westman Region > Brandon (0.04)
- Education (0.46)
- Health & Medicine (0.34)